K-difference Matching in Amortized Linear Time for All the Words in a Text
Abstract
Given a text x of length n, we study the problem of solving the k-difference problem for all the words, either with fixed or variable length, taken from the text itself. The result finds its application in pattern discovery in biosequences, where over- or under-represented words are extracted from the input sequences. The proposed algorithm runs in amortized linear time per word. This improves the complexity obtained by applying well-known algorithms to each of the O(n) fixed length words or O(n^2) variable length words in x by a factor of k, √(k log k), or √(m log m), depending on the chosen algorithm. The space required is O(n) if we just count the occurrences, or O(n^2) if we also store the positions. This second scenario can be used as the basis for other applications, such as searching gapped factors with mismatches or approximate pattern matching extended to any word. © 2008 Elsevier B.V. All rights reserved.
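To make the problem statement concrete, the following is a minimal brute-force sketch of the fixed-length case, not the algorithm proposed in the paper and not one of the well-known algorithms it is compared against. It assumes that "difference" can be read as a mismatch (Hamming distance), which the abstract does not spell out, and the function names `hamming_within` and `count_k_mismatch_occurrences` are hypothetical. It compares every length-m word of x against every length-m window of x, so it takes roughly O(n^2 m) time overall.

```python
def hamming_within(a: str, b: str, k: int) -> bool:
    """Return True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            mismatches += 1
            if mismatches > k:
                # Stop early once the mismatch budget is exceeded.
                return False
    return True


def count_k_mismatch_occurrences(x: str, m: int, k: int) -> dict[str, int]:
    """Brute-force baseline (assumption: mismatches only, fixed word length m).

    For every distinct length-m word of x, count the positions of x at which
    it occurs with at most k mismatches.
    """
    # All length-m windows of x, one per starting position.
    windows = [x[i:i + m] for i in range(len(x) - m + 1)]
    counts: dict[str, int] = {}
    for word in set(windows):
        counts[word] = sum(hamming_within(word, w, k) for w in windows)
    return counts
```

For example, `count_k_mismatch_occurrences("aab", 2, 1)` compares the words "aa" and "ab" against both windows of the text. Only the counts are kept here, which corresponds to the counting scenario in the abstract; storing the matching positions instead would need a list per word.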
Similar resources
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
A New Structural Matching Method Based on Linear Features for High Resolution Satellite Images
Along with the commercial availability of high resolution satellite images in recent decades, extracting accurate 3D spatial information has become the centre of attention in many fields, and applications related to photogrammetry and remote sensing have increased. To extract such information, the images should be geo-referenced. The procedure of georeferencing is done in four main steps...
Streaming Pattern Matching with d Wildcards
In the pattern matching with d wildcards problem we are given a text T of length n and a pattern P of length m that contains d wildcard characters, each denoted by a special symbol '?'. A wildcard character matches any other character. The goal is to establish for each m-length substring of T whether it matches P. In the streaming model variant of the pattern matching with d wildcards problem ...
An Improvement in Support Vector Machines Algorithm with Imperialism Competitive Algorithm for Text Documents Classification
Due to the exponential growth of electronic texts, organizing and managing them requires a tool that provides users with the information and data they search for in the shortest possible time. Thus, classification methods have become very important in recent years. In natural language processing, and especially in text processing, one of the most basic tasks is automatic text classification. Moreover, text ...
Journal title:
- Theor. Comput. Sci.
Volume 410
Publication date: 2009